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Abstract 

This work presents GROUSE (Grassmannian Rank-One Update Subspace Estimation), an 
efficient online algorithm for tracking subspaces from highly incomplete observations. GROUSE 
requires only basic linear algebraic manipulations at each iteration, and each subspace update 
can be performed in linear time in the dimension of the subspace. The algorithm is derived 
by analyzing incremental gradient descent on the Grassmannian manifold of subspaces. With 
a slight modification, GROUSE can also be used as an online incremental algorithm for the 
matrix completion problem of imputing missing entries of a low-rank matrix. GROUSE performs 
exceptionally well in practice both in tracking subspaces and as an online algorithm for matrix 
completion. 

1 Introduction 

The evolution of high-dimensional dynamical systems can often be well summarized or approxi- 
mated in low-dimensional subspaces. Certain patterns of computer network traffic including origin- 
destination flows can be well represented by a subspace model [13]. Environmental monitoring of 
soil and crop conditions |10j , water contamination [15] , and seismological activity |18] have all been 
demonstrated to be efficiently summarized by very low-dimensional subspace representations. 

The subspace methods employed in the aforementioned references are based upon first collecting 
full-dimensional data from the systems and then using approximation techniques to identify an 
accurate low-dimensional representation. However, it is often difficult or even infeasible to acquire 
and process full-dimensional measurements. For example, collecting network traffic measurements 
at a very large number of points and fine time-scales is impractical. An alternative is to randomly 
subsample the full-dimensional data. If the complete full-dimensional data is well-approximated 
by a lower dimensional subspace, and hence is in some sense redundant, then it is conceivable that 
the subsampled data may provide sufficient information for the recovery of that subspace. This is 
the central intuition of our proposed on-line algorithm for identifying and tracking low-dimensional 
subspaces from highly incomplete (i.e., undersampled) data. Such an algorithm could enable rapid 
detection of traffic spikes or intrusions in computer networks [13] or could provide large efficiency 
gains in managing energy consumption in a large office building [8]. 
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In this paper, we present GROUSE (Grassmannian Rank-One Update Subspace Estimation), a 
subspace identification and tracking algorithm that builds high quality estimates from very sparsely 
sampled vectors. GROUSE implements an incremental gradient procedure with computational 
complexity linear in dimensions of the problem, and is scalable to very high-dimensional appli- 
cations. An additional feature of GROUSE is that it can be immediately adapted to an 'online' 
version of the matrix completion problem, where one aims to recover a low-rank matrix from a 
small, streaming random subsets of its entries. GROUSE is not only remarkably efficient for online 
matrix completion, but additionally enables incremental updates as columns are added or entries 
are incremented over time. These features are particularly attractive for maintaining databases of 
user preferences and collaborative filtering. 

2 Problem Set-up 

We aim to track an d-dimensional subspace of M" that evolves over time, denoted by S[t]. At every 
time t, we observe a vector vt G S[t] at locations fit C {1, . . . n}. Let Vut denote the \Qt \ matrix 
that selects the coordinate axes of M" indexed by Oj. That is, we observe VntVt-, t = 1,2, ... . 
We will measure the error of our subspace using the natural cost function: the squared Euclidean 
distance from our current subspace estimate S'est \t\ to the observed vector vt only on the coordinates 
revealed in the set in ^t'- 

We can compute F{S] t) explicitly for any subspace S. Let U be any matrix whose columns span 
S. Let C/f^t denote the submatrix of U consisting of the rows indexed by For a vector v G M", 
let vq,^ = VQ,^{vt) denote a vector in ]RI^*I whose entries are indexed by il^. Then we have 

^(5; t) = min IIC/qjO - untip (1) 

a 

We will use this definition in Section [s] to derive our algorithm. If the matrix Uq^Uq^ has full rank, 
then we must have that w = {U^Ua^)~^U'^^VQ,^ achieves the minimum in (jl|. Thus, 

F[S-t) = vliI-Un,{UlUnJ-'ul)vn, . 

In the special case where the subspace is time- invariant, that is S[t] = So for some fixed subspace 
Sq, then it is natural to consider the average cost function 

T 

F{S) ■.= Y,dist{rnAS),rnAvt)f . (2) 
t=l 

The average cost function will allow us to estimate the steady-state behavior of our algorithm. 
Indeed, in the static case, our algorithm will be guaranteed to converge to a stationary point of 
F{S). 

2.1 Relation to Matrix Completion 

In the scenario where the subspace does not evolve over time and we only observe vectors on a 
finite time horizon, then the cost function (12]) is identical to the matrix completion optimization 
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problem studied in [TTl [6]. To see the equivalence, let Q = {{k,t) : k £ ^Itl < t < T}, and let 
V = [vi, . . . , vt] ■ Then 

T 

F{S) = y^min \\UQ^a - u^tlP 

t=l 

= min y {UA-V)l 

That is, the global optimization problem can be written as minjy^A j)gQ(C^^ — which is 
precisely the starting point for the algorithms and analyses in |1H |6] . The authors in use a 
gradient descent algorithm to jointly minimize both U and A while [6] minimizes this cost function 
by first solving for A and then taking a gradient step with respect to U. In the present work, we 
consider optimizing this cost function one column at a time. We show that by using our online 
algorithm, where each measurement vt corresponds to a random column of the matrix V, we achieve 
state-of-the-art performance on matrix completion problems. 



3 Stochastic Gradient Descent on the Grassmannian 

The set of all subspaces of M" of dimension d is denoted &{n,d) and is called the Grassmannian. The 
Grassmannian is a compact Riemannian manifold, and its geodesies can be explicitly computed [7]. 
An element S £ (S{n,d) can be represented by any n x d matrix U whose columns form an 
orthonormal basis for S. Our algorithm derives from an application of incremental gradient descent 
on the Grassmannian manifold. We first compute a gradient of the cost function F, and then follow 
this gradient along a short geodesic curve in the Grassmannian. 

We follow the program developed in [7]. To compute the gradient of F on the Grassmannian 
manifold, we first need to compute the partial derivatives of F with respect to the components of 
U. For a generic subspace, the matrix U^JJ^^ has full rank provided that \Vtt\ > d, and hence the 
cost function (fl]) is differentiable almost everywhere. Let Aq^ be the n x n diagonal matrix which 
has 1 in the j diagonal entry if j £ $7^ and has otherwise. We can rewrite 

F{S;t)=mm\\An,{Ua-vt)f 

a 

from which it follows that the derivative of F with respect to the elements of U is 

— = -2iAnAvt - Uw))w^ 

= -2rvF (3) 

where r := A^^(yt — Uw) denotes the (zero padded) residual vector and w is the least-squares 
solution in ([l]). 

Using Equation (2.70) in [7], we can calculate the gradient on the Grassmannian from this 
partial derivative 



-2(1 - UU'^)rw'^ = -2rw 



T 
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The final equality follows because the residual vector r is orthogonal to all of the columns of U . 
This can be verified from the definitions of r and w. 

A gradient step along the geodesic with tangent vector —VF is given by Equation (2.65) in [7], 
and is a function of the singular values and vectors of V-F. It is trivial to compute the singular 
value decomposition of VF, as it is rank one. The sole non-zero singular value is cr = 2||r||||t(;|| and 
the corresponding left and right singular vectors are and respectively. Let rE2, . . . , be an 
orthonormal set orthogonal to r and ?/2) ■ • • ; yd be an orthonormal set orthogonal to w. Then 



rp 

-2rw 



jPU X2 ... Xd 



X diag((j, 0, . . . , 0) X 



M\ ^2 ••• Vd 



forms an SVD for the gradient. Now using (2.65) from [7], we find that for r/ > 0, a step of length 
r] in the direction VF is given by 

Uiv) = U+ ^^^^^Plf^Uwu^^ + sin(a,)^^ 

ll^ll IfII II^II 

U + ( sm{ar]) TT—rr + {cos{ar]) — V 



\r\\ \\P\\ / \\w\\ 

where p := Uw, the predicted value of the projection of the vector v onto 5. 

This geodesic update rule is remarkable for a number of reasons. First of all, it consists only 
of a rank-one modification of the current subspace basis U. Second, the term = ^^^^J^^ [g 

on the order of rj when ar] is small. That is, for small values of a and r] this expression looks 
like a normal step along the gradient direction —2rw^ given by (|3|). From the Taylor series of the 

— T 

cosine, we see that the second term is approximately equal to a'^rf ■ That is, this term serves 

as a second order correction to keep the iterates on the Grassmannian. Surprisingly, this simple 
additive term maintains the orthogonality, obviating the need for orthogonalizing the columns of U 
after a gradient step. Below, we will also discuss how this iterate relates to more familiar iterative 
algorithms from linear algebra which use full information. 

The GROUSE algorithm simply follows geodesies along the gradients of F with a prescribed set 
of step-sizes rj. The full computation is summarized in Algorithm [TJ Our derivations have shown 
that computing a gradient step only requires the solution of the least squares problem ([T]), the 
computation of p and r, and then a rank one update to the previous subspace. 

Each step of GROUSE can be performed efficiently with standard linear algebra packages. 
Computing the weights in Step [2] of Algorithm [T] requires solving a least squares problem in \Vlt\ 
equations and d unknowns. Such a system is solvable in at most 0{\Q.t\d'^) flops in the worst case. 
Predicting the component of v that lies in the current subspace requires a matrix vector multiply 
that can be computed in 0{nd) flops. Computing the residual then only requires 0(|ilt|) flops, as 
we will always have zeros in the entries indexed by the complement of Vtf Computing the norms of 
r and p can be done in 0{n) flops. The final subspace update consists of adding a rank one matrix 
to an n X d matrix and can be computed in 0{nd) flops. Totaling all of these computation times 
gives an overall complexity estimate of 0{nd+ \Q.t\d'^) flops per subspace update. 
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Algorithm 1 Grassmannian Rank-One Update Subspace Estimation 

Require: An n x d orthogonal matrix Uq. A sequence of vectors vt, each observed in entries ^If 
A set of stepsizes r]t . 
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for t = 1, . . . ,r do 



2: Estimate weights: w = argmin^ ||Ant(f/fa — vt)\\ 

3: Predict full vector: p = UtW 

4: Compute residual: r = A^^^vt — p) 

5: Update subpace: Ut+i = Ut + (^{cos{ar]t) - + sin((T??f) 

where a = \\r\\ \\p\\ 

6: end for 



3.1 Step-sizes 

In the static case where the subspace S[t] = Sq for all t, we can guarantee that the algorithm 
converges to a stationary point of ^ as long as the stepsizes rjt satisfy 

oo 

lim r]t = and > rjt = oo 

fc— >oo ^ — ' 

k=l 

Selecting rjt oc 1/t will satisfy this assumption. This analysis appeals to the classical ODE 
method [12], and we are guaranteed such convergence because (5{n,d) is compact. 

In the case that S[t] is changing over time, a constant stepsize is needed to continually adapt 
to the changing subspace. Of course, if a non-vanishing stepsize is used then the error will not 
converge to zero, even in the static case. This leads to the common tradeoff between tracking rate 
and steady-state error in adaptive filtering problems. We explore this tradeoff in Section [4| 

3.2 Comparison to methods that use full information 

GROUSE can be understood as an adaptation of an incremental update to a QR or SVD factoriza- 
tion. Most batch subspace identification algorithms that rely on the eigenvalue decomposition, the 
singular value decomposition, or their more efficient counterparts such as the QR decomposition or 
the Lanczos method, can be adapted for on-line updates and tracking of the principal subspace. A 
comprehensive survey of these methods can be found in [5] . 

Suppose we fully observe the vector v at each increment. Given an estimated basis, Uest for the 
unknown subspace S, we would update our estimate for 5 by computing the component of v that 
is orthogonal to U. We would then append this new orthogonal component to our basis U, and use 
an update rule based on the magnitude of the component of v that does not lie in the span of U. 
Brand [2] has shown that this method can be used to compute an efficient incremental update of an 
SVD using optimized algorithms for computing rank one modifications of matrix factorizations [9]. 

In lieu of being able to exactly able to compute component of vt that is orthogonal to our 
current subspace estimate, GROUSE computes this component only on the entries in Qf Recent 
work [1] shows that for generic subspaces, the estimate for r computed by Algorithm [T] in a single 
iteration is an excellent proxy for the amount of energy that lies in the subspace, provided that 
the number of measurements at each time step is greater than the true subspace dimension times a 
logarithmic factor. One can also verify that the GROUSE update rule corresponds to forming the 
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matrix [[/, r] and then truncating the last column of the matrix 



where i?^ denotes the (r + 1) x (r + 1) rotation matrix 



I-|^(l-cos(w)) 
irai sin(r/o-) 




That is, our algorithm computes a mixture of the current subspace estimate and a predicted or- 
thogonal component. This mixture is determined both by the stepsize and the relative energy of 
vt outside the current subspace. 

4 Numerical Experiments 

4.1 Subspace Identification and Tracking 

Static Subspaces We first consider the problem of identifying a fixed subspace. In all of the 
following experiments, the full data dimension is n = 700, the rank of the underlying subspace 
\s d = 10, and the sampling density is 0.17 unless otherwise noted. We generated a series of iid 
vectors vt according to generative model: 



where ?7true is an n x d matrix whose d orthonormal columns spam the subspace, a is a d x 1 vector 
whose entries are realizations of iid A/'(0, 1) random variables, and /3 is an n x 1 vector whose entries 
are iid AA(0,a;^). Here M{Q,uP') denotes the Gaussian distribution with mean zero and variance 
> which governs the SNR of our data. 

We implemented GROUSE (see Algorithm [T] above) with a stepsize rule of rjt := C/t for some 
constant C > 0. Figure [T]^a) shows the steady state error of the tracked subspace with varying 
values of C and the noise variance. All the data points reflect the error performance at t = 14000. 
We see that GROUSE converges for C ranging over an order of magnitude, however with additive 
noise the smaller stepsizes yield smaller errors. When there is no noise, i.e., = 0, the error 
performance is near the level of machine precision and is flat for the whole range of converging 
stepsizes. In Figure [T]^b) we show the number of input vectors after which the algorithm converges 
to an error of less than 10~^. The results are consistent with Figure[l]^a), demonstrating that smaller 
stepsizes in the suitable range take fewer vectors until convergence. We only ran the algorithm up 
to time t = 14000, so the data points for the smallest and the largest few stepsizes only reflect 
that the algorithm did not yet reach the desired error in the allotted time. Figures [T|c) and[T|d) 
repeat identical experiments with a constant stepsize policy, ijt = C. We again see a wide range of 
stepsizes for which GROUSE converges, though the region of stability is narrower in this case. 

We note that the norm of the residual ||r|| provides an excellent indicator for whether tracking 
is successful. A shown in Figure [2]^a), the error to the true subspace is closely approximated by 
II r II /II ft II . This confirms the theoretical analysis in [Ij which proves that this residual norm is an 
accurate estimator of the true subspace error when the number of samples is appropriately large. 



Vt = Utruea + P 
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Figure 1: (a) Performance sensitivity to noise for a diminishing stepsize policy 774 = Cjt. Results are 
displayed for three values of the noise magnitude, w. (b) The number of iterations required to achieve an 
error of 10^^ as a function of C. (c) and (d) are the same except that the stepsize policy is here r\t — C. 



Subspace Change Detection As a first example of GROUSE's ability to adapt to changes in 
the underlying subspace, we simulated a scenario where the underlying subspace abruptly changes 
at three points over the course of an experiment with 14000 observations. At each break, we selected 
a new subspace S uniformly at random and GROUSE was implemented with a constant stepsize. 
As is to be expected, the algorithm is able to re-estimate the new subspace in a time depending on 
the magnitude of the constant stepsize. 

Rotating Subspace In this second synthetic experiment, the subspace evolves according to a 
random ordinary differential equation. Specifically, we sample a skew-symmetric matrix B with 
independent, normally distributed entries and set 

U = BU, U[0] = Uo. 

The resulting subspace at each iteration is thus U[t] = exp{5tB) where (5 is a positive constant. 
The resulting subspace at each iteration is thus U[t] = exp{5tB) where (5 is a positive constant. In 
Figure [sj we show the results of tracking the rotating subspace with 6 = 10^^. To demonstrate the 
effectiveness of the tracking, we display the projection of four random vectors using both the true 
subspace (in blue) and our subspace estimate at that time instant (in red). 
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Figure 2: (a) Comparison of tlie distance to the true subspace and the norm of the residual in Step |4] 
of the GROUSE algorithm. The residual norm closely tracks the distance to the actual subspace. (b) 
Using constant stepsize to track sudden changes in subspace. We plot the transient behavior three constant 
stepsize policies. In (c), we again verify that the norm of the residual gives an accurate signature for anomaly 
detection and for tracking success. 
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Figure 3: Tracking a rotating subspace. Here we plot the norm of the projection of four random vectors 
over time. The blue curves denote the true values of these norms, and the red curves plot the GROUSE 
predictions. Note that except for very early transients, GROUSE fully tracks the subspace. 



Tracking Chlorine Levels We also analyzed the performance of the GROUSE algorithm on 
simulated chlorine level monitoring in a pressurized water delivery system. The data were generated 
using EPANET[] software and were previously analyzed [15] . The input to EPANET is a network 
layout of water pipes and the output has many variables including the chemical levels, one of 
which is the chlorine level. The data we used is available from [T5] This dataset has ambient 
dimension n = 166, and T = 4610 data vectors were collected, once every 5 minutes over 15 days. 
We tracked an d = 6 dimensional subspace and compare this with the best 6-dimensional SVD 
approximation of the entire complete dataset. The results are displayed in Figure |4| The table 
gives the results for various constant stepsizes and various fractions of sampled data. The smallest 
sampling fraction we used was 20%, and for that the best stepsize was 3e-2; we also ran GROUSE 
on the full data, whose best stepsize was 5e-3. As we can see, the performance error improves for 
the smaller stepsize of 5e-3 as the sampling fraction increases; Also the performance error improves 
for the larger stepsize of 3e-2 as the sampling fraction decreases. However for all intermediate 
sampling fractions there are intermediate stepsizes that perform near the best reconstruction error 
of about 0.12. The normalized error of the full data to the best 3-dimensional SVD approximation 
is 0.0704. Note that we only allow for one pass over the data set and yet attain, even with very 
sparse sampling, comparable accuracy to a full batch SVD which has access to all of the data. 

' http : //www, epa.gov/nrnirl/w swrd/ dw/ epan et . html 

^ http : //www. cs . emu. edu/afs/cs/proj ect /spirit- 1/www/ 
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Figure 4: (a) Actual sensor readings, (b) Predicted sensor readings from tracked subspace. In these 
figures we are displaying the values at sensors 4, 17, 148, 158, and 159. The table lists errors in tracking 
the chlorine data set for varying sampling densities and stepsizes. The error between the data and the best 
SVD approximation is 0.0704. 

Figure Qa) and (b) show the original and the GROUSE reconstructions of five of the chlorine 
sensor outputs. We plot the last 500 of the 4310 samples, each reconstructed by the estimated 
subspace at that time instant. 

4.2 Matrix Completion Problems 

As described in Section |2.1[ matrix completion can be thought of as a subspace identification 
problem where we aim to identify the column space of the unknown low-rank matrix. Once the 
column space is identified, we can project the incomplete columns onto that subspace in order to 
complete the matrix. We have examined GROUSE in this context with excellent results. Our 
approach is to do descent on the column vectors in random order, and allow the algorithm to pass 
over those same incomplete columns a few times. 

Our simulation set-up aimed to complete 700 x 700 dimensional matrices of rank 10 sampled 
with density 0.17. We generated the low-rank matrix by generating two factors Yl and Yr with 
i.i.d. Gaussian entries, and added normally distributed noise with variance uj'^. The robustness to 
step-size and time to recovery are shown in Figure [5j 

In Figure [6] we show a comparison of five matrix completion algorithms and GROUSE. Namely, 
we compare to the performance of OPT-SPACE [II], FPCA (H], SVT g], SDPLR [31 US], and 
NNLS [TTj. We downloaded each of these MATLAB codes from the original developer's websites 
when possible. We use the same random matrix model as in Figure [5] GROUSE is faster than all 
other algorithms, and achieves higher quality reconstructions on many instances. We subsequently 
compared against NNLS, the fastest batch method, on very large problems. Both GROUSE and 
NNLS achieved excellent reconstruction, but GROUSE was twice as fast. 

5 Discussion and Future Work 

The inherent simplicity and empirical power of the GROUSE algorithm merit further theoretical 
investigations to determine exactly when it provides consistent estimates. It is possible that an 
adaptation of the analysis of [TT] to this online setting will yield such consistency bounds. The main 
difficulty lies in characterizing the trajectory of the GROUSE algorithm from a random starting 
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Figure 5: (a) Performance sensitivity to noise parameter lo and stepsize for matrix completion. Here we use 
a diminishing stepsize rj — C/t. (b) Here we plot the time to get to a desired Frobenius norm error on the 
hidden matrix. We see that GROUSE converges to the noise floor after at most 10 passes over the columns 
of the matrix. 
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Figure 6: The figure compares five matrix completion algorithms against GROUSE. The table gives a 
comparison of GROUSE and NNLS on large random matrix completion problems. 



subspace. Though we can guarantee that there is a basin of attraction around the global minimum, 
it is not yet clear how to characterize when GROUSE will end up in this basin. 

We would also like to investigate how to adapt step-size in the GROUSE algorithm automatically 
to varying data. This is the only parameter required to run the GROUSE algorithm, and we 
have seen that there are substantial performance gains when this parameter is optimized. Using 
techniques from Least-Squares Estimation, it may be possible to automatically tune the step size 
based on our current error residuals. 
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